Split-lexicon based hierarchical recognition of speech using syllable and word level acoustic units

Authors

Abhinav Sethy

Shrikanth S. Narayanan

Abstract

Most speech recognition systems, especially LVCSR systems, use context-dependent (CD) phones as the basic acoustic unit for recognition. The primary motive for this is the relative ease with which phone-based systems can be trained robustly with small amounts of data. However, as recent research indicates, significant improvements in recognition accuracy can be gained by using acoustic units of longer duration, such as syllables. Syllables and other longer units provide an efficient way of modeling long-term temporal dependencies in speech, which are difficult to cover in a phoneme-based recognition framework. These longer units, however, suffer from a training data sparsity problem, since a large number of units in the lexicon will have little or no acoustic training data. In this paper we present a two-step approach to address the training data sparsity problem. First, we use CD phones to initialize the higher-level units in a manner that minimizes the impact of training data sparsity. Subsequently, we present methods to split the lexicon into units of different acoustic length based on an analysis of the training data. We present results showing that a 25-30% improvement in word error rate can be achieved by using CD phone initialization and variable-length unit selection on an LVCSR task.
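The abstract does not state the criterion used to split the lexicon, so the following is only a minimal sketch of one way such a split could look, assuming a simple count threshold on the training data. The function name `split_lexicon`, both thresholds, the toy counts, and the fallback to CD phones for rare words are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List


def split_lexicon(
    word_counts: Dict[str, int],
    syllabify: Callable[[str], List[str]],
    min_word_count: int = 100,      # hypothetical threshold
    min_syllable_count: int = 50,   # hypothetical threshold
) -> Dict[str, str]:
    """Assign each word an acoustic unit level: 'word', 'syllable' or 'phone'."""
    # A syllable is observed once for every training token of a word containing it.
    syllable_counts: Counter = Counter()
    for word, count in word_counts.items():
        for syllable in syllabify(word):
            syllable_counts[syllable] += count

    levels: Dict[str, str] = {}
    for word, count in word_counts.items():
        if count >= min_word_count:
            # Enough data to train a dedicated word-level unit.
            levels[word] = "word"
        elif all(syllable_counts[s] >= min_syllable_count for s in syllabify(word)):
            # Every syllable of the word is well covered by the training data.
            levels[word] = "syllable"
        else:
            # Assumed fallback: keep modeling the word with CD phones.
            levels[word] = "phone"
    return levels


# Toy data, invented for illustration only.
toy_syllables = {
    "the": ["the"],
    "seven": ["se", "ven"],
    "eleven": ["e", "le", "ven"],
    "zebra": ["ze", "bra"],
}
toy_counts = {"the": 500, "seven": 20, "eleven": 30, "zebra": 2}

toy_levels = split_lexicon(
    toy_counts, toy_syllables.__getitem__, min_word_count=100, min_syllable_count=20
)
```

In this toy setup a frequent word gets its own unit, words whose syllables are all sufficiently frequent are modeled at the syllable level, and the rest stay with phones. In a real system the thresholds would have to be tuned on the training data.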

Similar resources

Syllable Speech Recognition Output Post-Processing Based on Models of Acoustics, Phonetics and Lexicon

The paper presents advances in a multi-level automatic speech understanding approach that was initially developed for highly inflective languages with relatively free word order. Two levels are considered. On the first level, a syllable-based grammar phoneme recognizer is applied, whose output is post-processed at the second level. The described model of postprocessing involves acoustic and phon...

Improvements in English ASR for the MALACH Project Using Syllable-centric Models

LVCSR systems have traditionally used phones as the basic acoustic unit for recognition. Syllable and other longer length units provide an efficient means for modeling long-term temporal dependencies in speech that are difficult to capture in a phone based recognition framework. However, it is well known that longer duration units suffer from training data sparsity problems since a large number...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Morpheme Segmentation and Concatenation Approaches for Uyghur LVCSR

In this paper, various kinds of sub-word lexica are thoroughly investigated under the framework of a Uyghur LVCSR system. Experimental results show that it is inefficient to model directly with word units or with small units such as morphemes or even syllables. It is observed that an optimal sub-word unit set between word and morpheme units is a better fit for an ASR system. In order to select best u...

Publication date: 2003
